## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] LaplacesDemon_16.1.6 mvtnorm_1.1-1 reshape2_1.4.4
## [4] dplyr_1.0.0 gtools_3.8.2 rstan_2.21.2
## [7] ggplot2_3.3.2 StanHeaders_2.21.0-5 mclust_5.4.6
## [10] lme4_1.1-23 Matrix_1.2-18 mgcv_1.8-31
## [13] nlme_3.1-148 RColorBrewer_1.1-2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5.1 lattice_0.20-41 prettyunits_1.1.1 ps_1.3.3
## [5] assertthat_0.2.1 digest_0.6.25 V8_3.2.0 R6_2.4.1
## [9] plyr_1.8.6 stats4_4.0.2 evaluate_0.14 pillar_1.4.6
## [13] rlang_0.4.7 curl_4.3 minqa_1.2.4 callr_3.4.3
## [17] nloptr_1.2.2.2 rmarkdown_2.3 splines_4.0.2 statmod_1.4.34
## [21] stringr_1.4.0 loo_2.3.1 munsell_0.5.0 compiler_4.0.2
## [25] xfun_0.15 pkgconfig_2.0.3 pkgbuild_1.1.0 htmltools_0.5.0
## [29] tidyselect_1.1.0 tibble_3.0.3 gridExtra_2.3 codetools_0.2-16
## [33] matrixStats_0.56.0 fansi_0.4.1 crayon_1.3.4 withr_2.2.0
## [37] MASS_7.3-51.6 grid_4.0.2 jsonlite_1.7.0 gtable_0.3.0
## [41] lifecycle_0.2.0 magrittr_1.5 scales_1.1.1 RcppParallel_5.0.2
## [45] cli_2.0.2 stringi_1.4.6 ellipsis_0.3.1 generics_0.0.2
## [49] vctrs_0.3.2 boot_1.3-25 tools_4.0.2 glue_1.4.1
## [53] purrr_0.3.4 processx_3.4.3 parallel_4.0.2 yaml_2.2.1
## [57] inline_0.3.15 colorspace_1.4-1 knitr_1.29
Load dataset
Check outliers
## The dataset contains 2649 patients with measured PfHRP2 and measured platelet counts from 4 studies
## Patients per study:
##
## Bangladesh FEAST (Uganda) Kampala (Uganda) Kilifi (Kenya)
##        172            567              492           1418
Some data cleaning

##
## FALSE  TRUE
##     6  2643
##
## Bangladesh FEAST (Uganda) Kilifi (Kenya)
##          1              8             18

## A total of 27 samples have zero PfHRP2 but more than 1000 parasites per uL
## After excluding the HRP2 outliers, the dataset contains 2622 patients with measured PfHRP2 and measured platelet counts
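The outlier rule above (zero PfHRP2 despite more than 1000 parasites per uL) can be sketched in base R as follows. This is a minimal sketch on made-up data, not the analysis code. The column `hrp2` mirrors the name used in the correlation tests; `parasites` and all values are assumptions.

```r
# Toy data; `parasites` is a hypothetical column name for parasites per uL
dat <- data.frame(
  study     = c("Site A", "Site A", "Site B", "Site B"),
  hrp2      = c(0, 1500, 0, 20),
  parasites = c(5000, 20000, 200, 1e5),
  platelet  = c(80, 60, 250, 40)
)

# Flag samples with no detectable PfHRP2 despite more than 1000 parasites per uL
is_outlier <- dat$hrp2 == 0 & dat$parasites > 1000

# Count outliers per study, then drop them
table(dat$study[is_outlier])
dat_clean <- dat[!is_outlier, ]
nrow(dat_clean)
```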
Overview of patient characteristics
Results for Table 1 in the paper
##
## Bangladesh FEAST (Uganda) Kampala (Uganda) Kilifi (Kenya)
##        171            559              492           1400
##
##   Bangladesh FEAST (Uganda) Kampala (Uganda) Kilifi (Kenya)
## 0          0            227                0              0
## 1        171            332              492           1400
##              study age.lower age.median age.upper
## 1       Bangladesh      23.5       30.0      45.0
## 2   FEAST (Uganda)       1.2        2.0       3.3
## 3 Kampala (Uganda)       2.2        3.3       4.6
## 4   Kilifi (Kenya)       1.4        2.4       3.7
##              study hrp2
## 1       Bangladesh  171
## 2   FEAST (Uganda)  559
## 3 Kampala (Uganda)  492
## 4   Kilifi (Kenya) 1400
##              study outcome
## 1       Bangladesh    26.9
## 2   FEAST (Uganda)    11.4
## 3 Kampala (Uganda)     6.7
## 4   Kilifi (Kenya)    11.1
##              study platelet.25% platelet.50% platelet.75%
## 1       Bangladesh         27.0         50.0        139.0
## 2   FEAST (Uganda)         74.5        165.0        326.0
## 3 Kampala (Uganda)         49.0         96.0        169.5
## 4   Kilifi (Kenya)         64.0        111.0        215.0
##              study  hrp2.25%  hrp2.50%  hrp2.75%
## 1       Bangladesh 1082.9050 2667.0400 6127.5550
## 2   FEAST (Uganda)    0.0000  174.7100 1952.6900
## 3 Kampala (Uganda)  588.0000 1838.4000 4097.4000
## 4   Kilifi (Kenya)  418.7393 2206.7408 5071.5299
##              study wbc.25% wbc.50% wbc.75%
## 1       Bangladesh   6.900   9.000  11.000
## 2   FEAST (Uganda)   8.400  11.950  18.675
## 3 Kampala (Uganda)   7.500  10.400  15.300
## 4   Kilifi (Kenya)   8.900  12.550  19.000
##              study para.25% para.50% para.75%
## 1       Bangladesh    23550   148874   348540
## 2   FEAST (Uganda)     3640    37600   153680
## 3 Kampala (Uganda)    10635    42530   198540
## 4   Kilifi (Kenya)     6099    69824   316350
##
##                     1  2  3
## Bangladesh          0  0  0
## FEAST (Uganda)    466 46 21
## Kampala (Uganda)  463  4 23
## Kilifi (Kenya)   1348 41  7
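The per-study quartile tables above have the layout that base R's `aggregate()` produces when `quantile()` is the summary function: the result holds a matrix column, printed as `platelet.25%`, `platelet.50%` and so on. A minimal sketch on simulated data; the site labels and distributions are made up, and this is not necessarily how the report's tables were computed.

```r
set.seed(1)
# Simulated platelet counts for two hypothetical sites
dat <- data.frame(
  study    = rep(c("Site A", "Site B"), each = 50),
  platelet = c(rlnorm(50, log(80), 0.6), rlnorm(50, log(200), 0.5))
)

# One row per study; `probs` is passed through to quantile()
q <- aggregate(platelet ~ study, data = dat,
               FUN = quantile, probs = c(0.25, 0.5, 0.75))
print(q)
```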
Correlation between the platelet count and the PfHRP2 concentration
## African sites: correlation:
##
## Pearson's product-moment correlation
##
## data: log10(dat_all$platelet[ind_Africa]) and log10(dat_all$hrp2 + 1)[ind_Africa]
## t = -31.892, df = 2449, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5690929 -0.5131183
## sample estimates:
## cor
## -0.5417058
## [1] -186.2477
## cor
## -0.54
## Bangladesh: correlation:
##
## Pearson's product-moment correlation
##
## data: log10(dat_all$platelet[!ind_Africa]) and log10(dat_all$hrp2 + 1)[!ind_Africa]
## t = -4.9404, df = 169, p-value = 1.863e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4797402 -0.2167256
## sample estimates:
## cor
## -0.3552439
## [1] -5.72988
## cor
## -0.36
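Both correlation tests pair log10 platelets with log10(PfHRP2 + 1). The +1 offset keeps patients with zero PfHRP2 finite, since log10(0) is -Inf. The quartile table shows that at least a quarter of FEAST patients have zero PfHRP2. A minimal sketch of the same kind of call on simulated data (the data-generating values are assumptions):

```r
set.seed(42)
n <- 200
# Simulated PfHRP2 including some zeros, and platelets that fall as PfHRP2 rises
hrp2 <- c(rep(0, 20), rlnorm(n - 20, log(1000), 1.5))
platelet <- 10^(2.3 - 0.3 * log10(hrp2 + 1) + rnorm(n, 0, 0.25))

# Pearson correlation on the log10 scale; the +1 offset keeps zero PfHRP2 finite
ct <- cor.test(log10(platelet), log10(hrp2 + 1))
ct$estimate            # sample correlation
ct$conf.int            # 95% confidence interval
round(ct$estimate, 2)  # rounded to two decimals
```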
Summary plot of the biomarker data

Basic data exploration: clustering with mclust
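mclust is a general-purpose model-based clustering package: it fits Gaussian finite mixture models (mixtures of multivariate normals) to the data with the EM algorithm and selects the number of components and the covariance structure by BIC.
We pool the data from all studies into a single dataset and fit mclust.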
## [1] 2477
##
## Bangladesh FEAST (Uganda) Kampala (Uganda) Kilifi (Kenya)
##        170            425              484           1398
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEV (ellipsoidal, equal shape) model with 4 components:
##
##  log-likelihood    n df       BIC       ICL
##       -6671.425 2477 33 -13600.74 -14640.71
##
## Clustering table:
##    1    2    3    4
## 1038  140  837  462
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VVE (ellipsoidal, equal orientation) model with 4 components:
##
##  log-likelihood    n df       BIC       ICL
##       -3708.804 2477 20 -7573.904 -8377.639
##
## Clustering table:
##    1    2    3    4
##   66  144 1486  781

## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEV (ellipsoidal, equal shape) model with 4 components:
##
##  log-likelihood    n df       BIC       ICL
##       -5850.394 2477 20 -11857.08 -12957.88
##
## Clustering table:
##    1    2    3    4
##  589  853  891  144

## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VVE (ellipsoidal, equal orientation) model with 3 components:
##
##  log-likelihood    n df       BIC       ICL
##        -4172.28 2477 15 -8461.781 -9076.909
##
## Clustering table:
##    1    2    3
## 1319  196  962

##
##                   1   2   3   4
## Bangladesh        3   0 139  28
## FEAST (Uganda)   28 141 139 117
## Kampala (Uganda)  8   0 321 155
## Kilifi (Kenya)   27   3 887 481
In conclusion, the distribution of the parasite count is less easily decomposed into clusters than that of the platelet count or the HRP2 concentration. The HRP2/platelet count pair is the only combination for which the estimated boundary between clusters is not orthogonal to either variable.
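A minimal sketch of a pooled mclust fit and the study-by-cluster cross-tabulation, on simulated two-biomarker data. The cluster locations, spreads and site labels are made up. `Mclust()` fits a Gaussian mixture for each candidate number of components `G` and each covariance structure, and keeps the combination with the highest BIC. The `modelNames` argument can restrict the structures, for example to `"VEV"` or `"VVE"` as selected above.

```r
library(mclust)
set.seed(10)

# Simulated pooled data: two groups on (log10 platelets, log10 PfHRP2)
n1 <- 300; n2 <- 200
X <- rbind(
  cbind(rnorm(n1, log10(230), 0.15), rnorm(n1, log10(200), 0.3)),
  cbind(rnorm(n2, log10(70), 0.15),  rnorm(n2, log10(3000), 0.3))
)
colnames(X) <- c("log10_platelet", "log10_hrp2")
study <- sample(c("Site A", "Site B"), nrow(X), replace = TRUE)  # hypothetical sites

# Fit mixtures with 1 to 4 components; BIC picks G and the covariance model
fit <- Mclust(X, G = 1:4)
summary(fit)

# Cross-tabulate the hypothetical sites against the assigned cluster
tab <- table(study, fit$classification)
tab
```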
Fitting a mixture model to platelets and HRP2
Two-component mixture, not including FEAST
Main model
Analysis not including the FEAST study (severe malaria studies only)
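We transform the platelet counts and HRP2 measurements to the log10 scale and then multiply the log10 platelet counts by -1, so that for both biomarkers larger values indicate a higher likelihood of severe malaria. This is needed because the underlying Stan model uses the ordered vector type to avoid label switching: the severe malaria component must have the larger mean for each biomarker, which only holds if both biomarkers increase with severity.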
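A minimal sketch of this transformation on three made-up patients, assuming strictly positive values (how zero PfHRP2 values enter the Stan model is not shown in this section). After negating the log platelet count, both columns increase together as patients look more like severe malaria, which is the direction an ordered mean vector needs.

```r
# Made-up biomarker values for three patients, from least to most SM-like
dat <- data.frame(platelet = c(250, 110, 40),
                  hrp2     = c(150, 1200, 6000))

# log10 scale; platelets are negated so that larger values mean "more SM-like"
y <- cbind(neg_log10_platelet = -log10(dat$platelet),
           log10_hrp2         =  log10(dat$hrp2))
round(y, 3)
```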
## site_index_SMstudies
##    1    2    3
##  171  492 1400
## [1] 2063
## Priors on mean biomarker values:
##          [,1]      [,2]
## [1,] -2.39794 -1.875061
## [2,]  2.30103  3.477121
## Priors on standard deviations around biomarker values:
##      [,1] [,2]
## [1,]  0.1  0.1
## [2,]  0.1  0.1
## Priors on prevalence of SM:
##      [,1] [,2]
## [1,]   19    1
## [2,]   14    6
## [3,]   14    6
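The printed priors can be read back on the original scale. The mean priors appear to be log10 values of round numbers: 250 and 75 for the (negated) platelet count, and 200 and 3000 for PfHRP2. Assuming each row of the prevalence prior holds the two shape parameters of a Beta distribution, the implied prior prevalences are 0.95, 0.70 and 0.70 for the three studies in the order of site_index_SMstudies. The sketch below checks both readings; the Beta interpretation is an assumption.

```r
# Prior means on the model scale (row 1: negated log10 platelets, row 2: log10 PfHRP2)
prior_mu <- rbind(-log10(c(250, 75)), log10(c(200, 3000)))
round(prior_mu, 6)

# Prevalence priors, assuming each row is Beta(shape1, shape2)
prior_prev <- matrix(c(19, 1, 14, 6, 14, 6), ncol = 2, byrow = TRUE)
prior_mean <- prior_prev[, 1] / rowSums(prior_prev)
prior_ci <- t(apply(prior_prev, 1,
                    function(ab) qbeta(c(0.025, 0.975), ab[1], ab[2])))
round(cbind(mean = prior_mean, lower = prior_ci[, 1], upper = prior_ci[, 2]), 3)
```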
Compile the Stan model
Run the Stan models (the outputs are stored in Rout)
We check convergence with trace plots
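Trace plots can be complemented by the potential scale reduction factor R-hat, which rstan reports in the `Rhat` column of `summary(fit)$summary`. The sketch below computes the classic (non-split) Gelman-Rubin version in base R on simulated chains. It is an illustration rather than the convergence check used here. rstan's estimate is based on split chains, so its values differ slightly.

```r
# Classic Gelman-Rubin potential scale reduction factor (no chain splitting).
# `draws`: matrix of post-warmup draws, one row per iteration, one column per chain
rhat <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))          # mean within-chain variance
  B <- n * var(colMeans(draws))            # between-chain variance
  var_plus <- (n - 1) / n * W + B / n      # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(7)
mixed <- matrix(rnorm(4000), ncol = 4)            # 4 chains sampling the same target
stuck <- mixed + rep(c(0, 0, 0, 3), each = 1000)  # chain 4 shifted, as if stuck elsewhere
rhat(mixed)  # close to 1
rhat(stuck)  # clearly above 1
```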




## prevalence estimates for the three studies: model without correlation:
## [1] 94 71 65
## prevalence estimates for the three studies: model with correlation:
## [1] 96 73 66
## prevalence estimates for the three studies: model with correlation and weak priors:
## [1] 95 73 65
## prevalence estimates for the three studies: model with correlation and t-distributions:
## [1] 96 74 67
## mean values estimates for the 4 models:
## thetas_all
##
##                 not SM   SM
## Platelet count    232   71
## PfHRP2            193 3392
## thetas_all_cor
##
##                 not SM   SM
## Platelet count    222   74
## PfHRP2            189 3205
## thetas_all_cor_WP
##
##                 not SM   SM
## Platelet count    218   74
## PfHRP2            197 3230
## thetas_all_cor_tdist
##
##                 not SM   SM
## Platelet count    242   74
## PfHRP2            204 3202
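The mean estimates above are reported on the original measurement scale. Because the model works with log10 values (platelets negated), a model-scale mean maps back through 10^(-theta) for platelets and 10^theta for PfHRP2. Such a value is a geometric mean (the median under a log-normal assumption), not an arithmetic mean. A minimal sketch that back-transforms the prior means printed earlier, used here only as convenient input:

```r
# Model-scale means: row 1 = negated log10 platelets, row 2 = log10 PfHRP2
# (columns: not SM, SM); the prior means printed earlier are used as input
theta <- rbind(c(-2.39794, -1.875061),
               c( 2.30103,  3.477121))

# Undo the sign flip for platelets, then exponentiate both rows
natural <- rbind(`Platelet count` = 10^(-theta[1, ]),
                 PfHRP2           = 10^theta[2, ])
colnames(natural) <- c("not SM", "SM")
round(natural)
```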
## Standard deviation estimates for the 4 models:
##
##           [,1]      [,2]
## [1,] 0.2811903 0.7333359
## [2,] 0.3053538 0.4456030
##
##           [,1]      [,2]
## [1,] 0.3056151 0.7690742
## [2,] 0.3189634 0.4559764
##
##           [,1]      [,2]
## [1,] 0.3070648 0.7758202
## [2,] 0.3186704 0.4529417
##
##           [,1]      [,2]
## [1,] 0.2346785 0.7134718
## [2,] 0.2905635 0.4305740
## Correlation in Severe Malaria:
## [1] 0.1761821
## Correlation in not Severe Malaria:
## [1] 0.2406991
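A bivariate normal component with standard deviations sigma1, sigma2 and correlation rho has covariance matrix Sigma = D R D, where D = diag(sigma1, sigma2) and R is the 2x2 correlation matrix. The sketch below builds Sigma this way and checks it by simulation through the Cholesky factor. The mean, standard deviations and correlation are illustrative values, not the fitted ones.

```r
# Illustrative component parameters on the model scale (not fitted values)
mu  <- c(-1.9, 3.5)   # (negated log10 platelets, log10 PfHRP2)
sds <- c(0.3, 0.45)
rho <- 0.2

# Sigma = D R D, with D = diag(sds) and R the 2x2 correlation matrix
R <- matrix(c(1, rho, rho, 1), 2)
Sigma <- diag(sds) %*% R %*% diag(sds)

# Simulate via the Cholesky factor: rows of z %*% chol(Sigma) have covariance Sigma
set.seed(3)
z <- matrix(rnorm(2 * 5000), ncol = 2)
x <- sweep(z %*% chol(Sigma), 2, mu, "+")
round(c(cor = cor(x)[1, 2], sd1 = sd(x[, 1]), sd2 = sd(x[, 2])), 3)
```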

## [1] 1304
## [1] 403
Classifications under the bivariate normal model

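Under a two-component mixture, a patient with biomarker vector y has posterior probability of severe malaria P(SM | y) = pi * f_SM(y) / (pi * f_SM(y) + (1 - pi) * f_notSM(y)), where pi is the study prevalence and f_SM, f_notSM are the component densities. A minimal base-R sketch for the bivariate normal case with plug-in point values. The means are the log10 of the thetas_all estimates (platelets negated). The diagonal covariance matrices, the prevalence and the two patients are illustrative assumptions.

```r
# Bivariate normal density, written out to avoid extra packages
dbvn <- function(y, mu, Sigma) {
  dev <- y - mu
  q <- drop(crossprod(dev, solve(Sigma, dev)))  # squared Mahalanobis distance
  exp(-0.5 * q) / (2 * pi * sqrt(det(Sigma)))
}

# Component means: (negated log10 platelets, log10 PfHRP2), from thetas_all
mu_not <- c(-log10(232), log10(193))
mu_sm  <- c(-log10(71),  log10(3392))
# Illustrative covariance matrices and prevalence (assumptions, not fitted values)
Sigma_not <- diag(c(0.3, 0.3)^2)
Sigma_sm  <- diag(c(0.3, 0.45)^2)
prev <- 0.7

# Posterior probability that a patient belongs to the SM component
p_sm <- function(y) {
  num <- prev * dbvn(y, mu_sm, Sigma_sm)
  num / (num + (1 - prev) * dbvn(y, mu_not, Sigma_not))
}

# Hypothetical patients: platelets 60 with PfHRP2 4000, platelets 250 with PfHRP2 50
p_sm(c(-log10(60), log10(4000)))
p_sm(c(-log10(250), log10(50)))
```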
Classifications under the bivariate t-distribution model

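A bivariate Student-t component has heavier tails than a normal one, so a few extreme biomarker values pull less on the fitted means. For d = 2 dimensions the density simplifies to f(y) = (1 + q/nu)^(-(nu/2 + 1)) / (2 * pi * sqrt(det(Sigma))), with q the squared Mahalanobis distance and nu the degrees of freedom. The minimal base-R sketch below compares it with the normal density. The scale matrix, the far point and nu = 4 are illustrative assumptions.

```r
# Squared Mahalanobis distance of y from mu under scale matrix Sigma
maha2 <- function(y, mu, Sigma) {
  dev <- y - mu
  drop(crossprod(dev, solve(Sigma, dev)))
}
# Bivariate normal and bivariate Student-t (nu degrees of freedom) densities
dbvn <- function(y, mu, Sigma)
  exp(-0.5 * maha2(y, mu, Sigma)) / (2 * pi * sqrt(det(Sigma)))
dbvt <- function(y, mu, Sigma, nu)
  (1 + maha2(y, mu, Sigma) / nu)^(-(nu / 2 + 1)) / (2 * pi * sqrt(det(Sigma)))

mu <- c(0, 0)
Sigma <- matrix(c(0.09, 0.02, 0.02, 0.2), 2)  # illustrative scale matrix
far <- c(1.5, 2)                              # a point far from the centre

c(normal = dbvn(mu, mu, Sigma),  t = dbvt(mu, mu, Sigma, nu = 4))   # equal at the centre
c(normal = dbvn(far, mu, Sigma), t = dbvt(far, mu, Sigma, nu = 4))  # t has heavier tails
# A very large nu recovers the normal density
dbvt(far, mu, Sigma, nu = 1e6) / dbvn(far, mu, Sigma)
```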